Skip to content

feat: subscribe to multiple audio tracks and add RFC 6464 audio level extension#68

Open
martinadamsdev wants to merge 1 commit intolivekit:mainfrom
martinadamsdev:feat/multi-audio-track-subscription
Open

feat: subscribe to multiple audio tracks and add RFC 6464 audio level extension#68
martinadamsdev wants to merge 1 commit intolivekit:mainfrom
martinadamsdev:feat/multi-audio-track-subscription

Conversation

@martinadamsdev
Copy link

@martinadamsdev martinadamsdev commented Feb 7, 2026

Summary

  • Fix multi-device audio: Remove single audio track subscription limit in subscribe_tracks(). Devices in the same room can now hear each other (previously only the first audio track was subscribed, causing device-to-device audio to fail while web clients worked fine).
  • Add RFC 6464 audio level extension: Inject ssrc-audio-level extmap into publisher SDP and RTP packets via esp_peer's RTP transformer API, enabling LiveKit Active Speaker detection.
  • Upgrade esp_peer dependency: ~1.2.7>=1.3.0 (required for esp_peer_set_rtp_transformer API).

Changes

File What changed
engine.c session_state_t now tracks up to 4 subscribed audio track SIDs with deduplication; subscribe_tracks() no longer breaks after first track; handle_join() no longer exits loop on ENGINE_ERR_MAX_SUB
peer.c New sdp_inject_audio_level_extmap() for SDP patching; new RTP transformer callbacks for injecting RFC 5285/6464 audio level header extension; publisher offer SDP is patched before sending
idf_component.yml esp_peer version bumped to >=1.3.0

Test plan

  • Two ESP32 devices (device-a, device-b) + web client join the same room
  • Serial logs show multiple "Subscribing to audio track" lines per device
  • Device-a can hear device-b and vice versa
  • Web client can still hear both devices and be heard by both
  • Serial logs show "Audio level RTP transformer enabled" on both devices

Summary by CodeRabbit

  • New Features

    • Support for subscribing to multiple remote audio tracks per session with duplicate suppression and a configurable per-session limit
    • Publisher-side audio-level RTP header extension injection (RFC 6464) with SDP negotiation to enable per-stream audio level reporting
  • Chores

    • Updated dependency constraint for improved compatibility

@CLAassistant
Copy link

CLAassistant commented Feb 7, 2026

CLA assistant check
All committers have signed the CLA.

@coderabbitai
Copy link

coderabbitai bot commented Feb 7, 2026

Warning

Rate limit exceeded

@martinadamsdev has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 10 minutes and 26 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 402bab7 and 7809e0c.

📒 Files selected for processing (2)
  • components/livekit/core/peer.c
  • components/livekit/idf_component.yml
📝 Walkthrough

Walkthrough

Adds fixed-size multi-subscription support for audio tracks, implements RFC 6464 audio level RTP header extension injection and transformation for publishers, and updates the espressif/esp_peer dependency constraint.

Changes

Cohort / File(s) Summary
Audio Track Subscription Support
components/livekit/core/engine.c
Replaced single sub_audio_track_sid with sub_audio_track_sids[MAX_AUDIO_SUBSCRIPTIONS] and sub_audio_track_count; added MAX_AUDIO_SUBSCRIPTIONS (4); subscribe_tracks deduplicates, enforces limit, and stores multiple subscriptions; removed prior early-return/break behavior.
RFC 6464 Audio Level Extension
components/livekit/core/peer.c
Added SDP extmap injection for audio media lines, constants for extmap/IDs, and static SDP injector; added RTP transformer callbacks (get_encoded_size, transform) that target Opus (PT 111) and append an 8-byte audio-level extension; registers transformer for publisher peers and logs injection outcomes.
Component Dependency Update
components/livekit/idf_component.yml
Changed espressif/esp_peer constraint from ~1.2.7 to >=1.3.0.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant App as Publisher App
participant Peer as Peer (SDP generator)
participant Transformer as RTP Transformer
participant Remote as Remote Subscriber
App->>Peer: Request offer SDP (publisher)
Peer->>Peer: Inject RFC6464 extmap into audio m= line (if found)
Peer-->>App: Return patched SDP (or original if injection fails)
App->>Transformer: Send outbound RTP packets
Transformer->>Transformer: check PT==111 (Opus) and packet structure
Transformer->>Transformer: append 8-byte audio-level extension, set X-bit
Transformer-->>Remote: Transmit modified RTP packets

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I twitched my nose and counted four,

Subscriptions hopping, not one more,
I stitched a level in each beat,
Opus feet and rhythmic treat,
A rabbit’s cheer for audio lore!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly describes the two main features implemented: multiple audio track subscriptions and RFC 6464 audio level extension support, matching the core changes across engine.c and peer.c.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@martinadamsdev martinadamsdev force-pushed the feat/multi-audio-track-subscription branch from cccb4d7 to 402bab7 Compare February 7, 2026 15:58
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@components/livekit/core/peer.c`:
- Around line 40-44: The code currently injects a hardcoded RTP extmap ID and a
fixed audio level: update the injection logic that uses AUDIO_LEVEL_EXTMAP_LINE
and AUDIO_LEVEL_EXT_ID to first parse the SDP for existing "a=extmap:<id>"
entries and choose an unused numeric ID (or reuse the referred URI) instead of
always using 1 to avoid duplicate extmap IDs; also update the
AUDIO_LEVEL_DEFAULT constant comment and/or its use to clearly mark it as a
placeholder (e.g., "placeholder -30 dBov for testing") and, where RTP audio
level is generated, replace the fixed value with real VAD/level measurement or
call a helper that returns a measured level so active speaker detection reflects
actual speech activity.
- Around line 106-129: The function audio_level_get_encoded_size currently
hardcodes payload type 111 which breaks when Opus is negotiated to a different
dynamic PT; update the transformer to use a negotiated payload type value pulled
from configuration/state instead of 111 — e.g., add a payload_type field to the
transformer context (ctx) or to the peer state and populate it when parsing the
SDP/answer, then replace the literal 111 check in audio_level_get_encoded_size
with that context/state value (use esp_peer_rtp_frame_t and RTP_EXT_BLOCK_SIZE
as before) so the transformer works for any negotiated Opus payload type.
🧹 Nitpick comments (1)
components/livekit/idf_component.yml (1)

15-15: Unbounded version constraint risks pulling in future breaking changes.

Every other dependency in this file uses a ~ (compatible) constraint, but esp_peer now uses >=1.3.0 with no upper bound. A future major version bump (e.g., 2.0) could introduce breaking API changes. Consider using ~1.3.0 (or >=1.3.0, <2.0.0) to stay consistent and safe.

-  espressif/esp_peer: ">=1.3.0"
+  espressif/esp_peer: "~1.3.0"
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b70346 and cccb4d7.

📒 Files selected for processing (3)
  • components/livekit/core/engine.c
  • components/livekit/core/peer.c
  • components/livekit/idf_component.yml
🧰 Additional context used
📓 Path-based instructions (1)
components/**

📄 CodeRabbit inference engine (AGENTS.md)

Place LiveKit ESP32 SDK sources and supporting components in the components/ directory

Files:

  • components/livekit/core/engine.c
  • components/livekit/idf_component.yml
  • components/livekit/core/peer.c
🧠 Learnings (1)
📚 Learning: 2026-01-31T02:18:39.165Z
Learnt from: CR
Repo: livekit/client-sdk-esp32 PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-31T02:18:39.165Z
Learning: Applies to components/** : Place LiveKit ESP32 SDK sources and supporting components in the `components/` directory

Applied to files:

  • components/livekit/idf_component.yml
🧬 Code graph analysis (1)
components/livekit/core/engine.c (1)
components/livekit/core/signaling.c (1)
  • signal_send_update_subscription (420-436)
🔇 Additional comments (7)
components/livekit/core/engine.c (3)

91-98: LGTM!

The fixed-size array with a count field is a clean approach for this constrained environment, and memset in cleanup_previous_connection properly resets the state on reconnect.


163-183: LGTM!

The deduplication logic and bounded array management look correct. The strncmp and strlcpy both properly use sizeof(livekit_pb_sid_t) as the limit.


710-716: LGTM!

Removing the early exit ensures all participants are visited. Once max subscriptions are reached, subsequent subscribe_tracks calls gracefully return after the limit check with minimal overhead.

components/livekit/core/peer.c (4)

51-104: LGTM!

The SDP injection logic correctly scopes its search to the audio media section, handles the boundary properly, and manages memory allocation with correct sizes. The fallback to NULL when no direction attribute is found is appropriate.


131-176: LGTM!

The RFC 5285 one-byte header extension block is correctly constructed: 0xBEDE profile, 1-word length, proper element header with ID and length nibbles, audio level data with voice-activity bit, and padding to word boundary. The payload copy arithmetic is correct.


286-310: LGTM!

The patched SDP path correctly frees the allocated string after forwarding, and the fallback to the original SDP on injection failure is sound.


486-501: LGTM!

The transformer registration is correctly gated on publisher role with audio enabled, and gracefully degrades if the API call fails without blocking peer creation.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@components/livekit/core/peer.c`:
- Around line 139-171: In audio_level_transform, add a defensive NULL check for
frame->encoded_data (enc) before any writes/reads and return a non-zero error
(or propagate frame failure) if it's NULL; ensure you guard the subsequent
memcpy/enc writes and avoid modifying enc[0] etc. If available, log or set an
error code on the frame to aid debugging; keep the check at the top of
audio_level_transform immediately after assigning enc.
🧹 Nitpick comments (3)
components/livekit/core/engine.c (1)

710-716: Verify: subscribe_tracks errors are silently ignored here.

The return value of subscribe_tracks() is now discarded. This is fine for ENGINE_ERR_MAX_SUB (subsequent participants' tracks just log a warning), but ENGINE_ERR_INVALID_ARG would also be silently swallowed. In practice tracks comes from protobuf data that could have tracks_count > 0 with a valid pointer, so this is low risk, but worth being aware of.

components/livekit/core/peer.c (2)

51-104: SDP attribute search doesn't enforce line-boundary matching.

strstr(m_audio, "a=sendrecv") could match a substring within another attribute value (e.g., inside an a=fmtp: line). In practice SDP attributes are well-formed so this is very unlikely, but using a line-boundary-aware search (e.g., searching for "\r\na=sendrecv" or "\na=sendrecv") would be more robust.

Also, if the audio section uses a=inactive, the function returns NULL and falls back to the original SDP, which is reasonable.


290-310: Full SDP logged at INFO level may be noisy in production.

Lines 297 and 302 log the entire SDP string with ESP_LOGI. On embedded targets with limited serial bandwidth, consider using ESP_LOGD (debug level) to match the pattern used elsewhere in this file for verbose output.

♻️ Suggested change
-                    ESP_LOGI(TAG(peer), "Generated offer (with audio-level extmap):\n%s", patched_sdp);
+                    ESP_LOGD(TAG(peer), "Generated offer (with audio-level extmap):\n%s", patched_sdp);
-                    ESP_LOGI(TAG(peer), "Generated offer:\n%s", sdp);
+                    ESP_LOGD(TAG(peer), "Generated offer:\n%s", sdp);
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cccb4d7 and 402bab7.

📒 Files selected for processing (3)
  • components/livekit/core/engine.c
  • components/livekit/core/peer.c
  • components/livekit/idf_component.yml
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/livekit/idf_component.yml
🧰 Additional context used
📓 Path-based instructions (1)
components/**

📄 CodeRabbit inference engine (AGENTS.md)

Place LiveKit ESP32 SDK sources and supporting components in the components/ directory

Files:

  • components/livekit/core/peer.c
  • components/livekit/core/engine.c
🧬 Code graph analysis (1)
components/livekit/core/engine.c (1)
components/livekit/core/signaling.c (1)
  • signal_send_update_subscription (420-436)
🔇 Additional comments (3)
components/livekit/core/engine.c (2)

91-98: LGTM — clean fixed-size multi-subscription state.

The MAX_AUDIO_SUBSCRIPTIONS limit with a parallel count field is a straightforward approach for this embedded context. memset(&eng->session, 0, ...) in cleanup_previous_connection (Line 766) correctly resets both the array and the count on reconnect.


153-186: LGTM — deduplication and subscription logic is correct.

The dedup check before the capacity check is properly ordered (avoids counting an already-subscribed track against the limit). strlcpy safely handles SIDs that might exceed the buffer. One minor note: when ENGINE_ERR_MAX_SUB is returned at Line 177, any remaining tracks in the list (even if they'd be deduplicated/skipped) are not iterated, but since only audio tracks are relevant and the limit is legitimately reached, this is acceptable behavior.

components/livekit/core/peer.c (1)

486-501: LGTM — transformer registration is properly guarded.

Only registered for publisher role with a non-NONE audio codec. Failure is logged as a warning without blocking peer creation, which is the right trade-off for an optional feature.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@ladvoc
Copy link
Collaborator

ladvoc commented Feb 10, 2026

Hi @martinadamsdev, thanks for your contribution! Since this requires a minor version bump of esp_peer, I’ve opened a separate PR for that. Once that is merged, we can move forward with this PR.

@martinadamsdev martinadamsdev force-pushed the feat/multi-audio-track-subscription branch from 7809e0c to df19489 Compare February 11, 2026 11:38
… extension

When multiple devices join the same LiveKit room, they cannot hear each
other because subscribe_tracks() only subscribes to the first audio track
and breaks. This removes the single-track limitation, allowing subscription
to up to MAX_AUDIO_SUBSCRIPTIONS (4) audio tracks with deduplication.

Additionally, injects RFC 6464 ssrc-audio-level RTP header extension into
publisher SDP and outgoing RTP packets to support LiveKit Active Speaker
detection. Requires esp_peer >= 1.3.0 for the RTP transformer API.
@martinadamsdev martinadamsdev force-pushed the feat/multi-audio-track-subscription branch from df19489 to ecb1557 Compare February 11, 2026 11:40
@martinadamsdev
Copy link
Author

Hi @ladvoc, thanks for splitting out the esp_peer upgrade into #70!

Now that #70 is merged, I've rebased this PR onto the latest main and removed the idf_component.yml change (since it's already handled). The diff is now limited to engine.c and peer.c only.

Ready for your review when you get a chance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants